Monolingual and Bilingual Concept Visualization from Corpora
نویسندگان
چکیده
As well as identifying relevant information, a successful information management system must be able to present its findings in terms which are familiar to the user, which is especially challenging when the incoming information is in a foreign language (Levow et al., 2001). We demonstrate techniques which attempt to address this challenge by placing terms in an abstract ‘information space’ based on their occurrences in text corpora, and then allowing a user to visualize local regions of this information space. Words are plotted in a 2-dimensional picture so that related words are close together and whole classes of similar words occur in recognizable clusters which sometimes clearly signify a particular meaning. As well as giving a clear view of which concepts are related in a particular document collection, this technique also helps a user to interpret unknown words.
منابع مشابه
Metalinguistic Awareness and Bilingual vs. Monolingual EFL Learners: Evidence from a Diagonal Bilingual Context
This paper reports a study of 85 Iranian EFL learners in the English Language Department of Urmia University. It explores the possible differences between performance of 38 Persian monolingual and 47 Turkish-Persian bilingual EFL learners on metalinguistic tasks of ungrammatical structures and translation. The underlying hypothesis is that bilinguals in diagonal bilingual contexts experience a ...
متن کاملSynonymous Collocation Extraction Using Translation Information
Automatically acquiring synonymous collocation pairs such as and from corpora is a challenging task. For this task, we can, in general, have a large monolingual corpus and/or a very limited bilingual corpus. Methods that use monolingual corpora alone or use bilingual corpora alone are apparently inadequate because of low precision or low coverage. I...
متن کاملLearning Crosslingual Word Embeddings without Bilingual Corpora
Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling transfer of NLP tools. However, previous attempts had expensive resource requirements, difficulty incorporating monolingual data or were unable to handle polysemy. We address these drawbacks in our method which takes advantage of a high coverage dictionary in an EM style training alg...
متن کاملTowards producing bilingual lexica from monolingual corpora
Bilingual lexica are the basis for many cross-lingual natural language processing tasks. Recent works have shown success in learning bilingual dictionary by taking advantages of comparable corpora and a diverse set of signals derived from monolingual corpora. In the present work, we describe an approach to automatically learn bilingual lexica by training a supervised classifier using word embed...
متن کاملLearning Bilingual Lexicons from Monolingual Corpora
We present a method for learning bilingual translation lexicons from monolingual corpora. Word types in each language are characterized by purely monolingual features, such as context counts and orthographic substrings. Translations are induced using a generative model based on canonical correlation analysis, which explains the monolingual lexicons in terms of latent matchings. We show that hig...
متن کاملBuilding bilingual terminologies from comparable corpora: the TTC TermSuite
In this paper, we exploit domain-specific comparable corpora to build bilingual terminologies. We present the monolingual term extraction and the bilingual alignment that will allow us to identify and translate high specialised terminology. We stress the huge importance of taking into account both simple and complex terms in a multilingual environment. Such linguistic diversity implies to combi...
متن کامل